GPU-Acceleration of Sequence Homology Searches with Database Subsequence Clustering
Abstract
Sequence homology searches are used in various fields and require large amounts of computation time, especially for metagenomic analysis, owing to the large number of queries and the database size. Graphics processing units (GPUs) are widely used as a low-cost, high-performance computing platform for accelerating such analyses. We therefore mapped the time-consuming steps of GHOSTZ, a state-of-the-art homology search algorithm for protein sequences, onto a GPU and implemented the result as GHOSTZ-GPU. In addition, we optimized memory access for GPU calculations and for communication between the CPU and GPU. In an evaluation on metagenomic data, GHOSTZ-GPU with 12 CPU threads and 1 GPU was approximately 3.0- to 4.1-fold faster than GHOSTZ with 12 CPU threads. Moreover, GHOSTZ-GPU with 12 CPU threads and 3 GPUs was approximately 5.8- to 7.7-fold faster than GHOSTZ with 12 CPU threads.
Similar resources
Faster sequence homology searches by clustering subsequences
MOTIVATION Sequence homology searches are used in various fields. New sequencing technologies produce huge amounts of sequence data, which continuously increase the size of sequence databases. As a result, homology searches require large amounts of computational time, especially for metagenomic analysis. RESULTS We developed a fast homology search method based on database subsequence clusteri...
GHOSTM: A GPU-Accelerated Homology Search Tool for Metagenomics
BACKGROUND A large number of sensitive homology searches are required for mapping DNA sequence fragments to known protein sequences in public and private databases during metagenomic analysis. BLAST is currently used for this purpose, but its calculation speed is insufficient, especially for analyzing the large quantities of sequence data obtained from a next-generation sequencer. However, fast...
Limits of homology detection by pairwise sequence comparison
MOTIVATION Noise in database searches resulting from random sequence similarities increases as the databases expand rapidly. The noise problems are not a technical shortcoming of the database search programs, but a logical consequence of the idea of homology searches. The effect can be observed in simulation experiments. RESULTS We have investigated noise levels in pairwise alignment based da...
Hybrid Clustering Support Vector Machines by Incorporating Protein Residue Information for Protein Local Structure Prediction
Protein local structure prediction can be described as the prediction of protein secondary structure from a protein subsequence. This protein subsequence, also known as protein local structure, covers fragments of the protein sequence. In fact, it is easier to identify the sequence-to-secondary structure relationship using a protein subsequence than using the whole protein sequence. Further, this...
Prefiltering Model for Homology Detection Algorithms on GPU
Homology detection has evolved over time from heavy algorithms based on dynamic programming approaches to lightweight alternatives based on different heuristic models. However, the main problem with these algorithms is that they use complex statistical models, which makes it difficult to achieve a relevant speedup and find exact matches with the original results. Thus, their acceleration is...